Hands-on Exercise 10 - Part 1: Processing and Visualising Flow Data

Author

Lorielle Malveda

Published

October 30, 2024

Modified

November 4, 2024

1. OVERVIEW

Spatial interaction represents the flow of people, material, or information between locations in geographical space. The concept covers everything from freight transportation, energy flows, and international trade of rare items to flight patterns, peak-hour traffic, and pedestrian activity.

Each spatial interaction, as an analogy for a set of movements, is composed of a discrete origin/destination pair. Each pair can be represented as a cell in a matrix in which rows correspond to origin locations (centroids) and columns correspond to destination locations (centroids). Such a matrix is called an origin/destination (OD) matrix or spatial interaction matrix.
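As a minimal sketch of this idea, the base R snippet below turns a handful of origin/destination records into an OD matrix. The zone labels and trip counts are made up purely for illustration.

```r
# Toy flow records: one row per origin/destination pair (illustrative values only)
flows <- data.frame(
  origin      = c("A", "A", "B", "C", "C"),
  destination = c("B", "C", "A", "A", "B"),
  trips       = c(10, 5, 7, 3, 8)
)

# Cross-tabulate trips: rows are origins, columns are destinations
od_matrix <- xtabs(trips ~ origin + destination, data = flows)
od_matrix
```

Each cell holds the flow from the row zone to the column zone; pairs that do not appear in the records are filled with zero.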

In this exercise, we are going to build an OD matrix using the Passenger Volume by Origin-Destination Bus Stops data from the LTA DataMall.

The goals are to:

  • Import and filter OD data for a specified time period,
  • Import and store geospatial data (e.g., bus stops and MPSZ) in sf tibble data frames,
  • Add planning subzone codes to bus stops sf tibble data frames,
  • Generate desire line geospatial data from OD data, and
  • Visualize passenger volumes between origin and destination bus stops using desire lines data.

2. GETTING STARTED

For this exercise, we are going to use five essential R packages:

  1. sf – For importing, integrating, processing, and transforming geospatial data.
  2. tidyverse – For importing, integrating, wrangling, and visualizing data.
  3. tmap – For creating elegant, high-quality thematic maps suitable for cartographic display.
  4. stplanr – Offers functions for common transport planning and modeling tasks, such as downloading and cleaning transport datasets; generating geographic “desire lines” from origin-destination (OD) data; assigning routes locally and interfacing with routing services like CycleStreets.net; calculating route segment attributes like bearing and aggregated flow; and conducting ‘travel watershed’ analysis.
  5. DT – Provides an interface to the JavaScript library DataTables, enabling R data objects (matrices or data frames) to be displayed as interactive HTML tables with features like filtering, pagination, and sorting.

pacman::p_load(tmap, sf, DT, stplanr, tidyverse)

3. PREPARING THE FLOW DATA

3.1 Importing the OD Data

The first step is to import the Passenger Volume by Origin Destination Bus Stops data set downloaded from LTA DataMall, using read_csv() of the readr package.

odbus <- read_csv("data/aspatial/origin_destination_bus_202210.csv")

Next, use glimpse() to inspect the odbus tibble.

glimpse(odbus)
Rows: 5,122,925
Columns: 7
$ YEAR_MONTH          <chr> "2022-10", "2022-10", "2022-10", "2022-10", "2022-…
$ DAY_TYPE            <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 10, 10, 7, 11, 16, 16, 20, 7, 7, 11, 11, 8, 11, 11…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <dbl> 65239, 65239, 23519, 52509, 54349, 54349, 43371, 8…
$ DESTINATION_PT_CODE <dbl> 65159, 65159, 23311, 42041, 53241, 53241, 14139, 9…
$ TOTAL_TRIPS         <dbl> 2, 1, 2, 1, 1, 4, 1, 3, 1, 5, 2, 5, 15, 40, 1, 1, …

The output above shows that the values in ORIGIN_PT_CODE and DESTINATION_PT_CODE are stored as numeric data.

The code chunk below converts these values into factors, so that they are treated as categorical identifiers rather than numbers.

odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) 
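One thing worth checking at this point is whether any codes lost a leading zero when they were read in as numbers. The sketch below assumes that the bus stop layer stores its codes as five-character strings (an assumption to verify against BUS_STOP_N); under that assumption, a code read as 1012 would not match "01012" in a later join unless it is padded back to five digits. The two sample values are illustrative.

```r
# Two illustrative codes as they would appear after being parsed as numbers
codes_numeric <- c(1012, 65239)

# Plain conversion keeps the shortened form
codes_plain <- as.character(as.factor(codes_numeric))
codes_plain   # "1012" "65239"

# Zero-padding to five digits restores the assumed original form
codes_padded <- sprintf("%05d", as.integer(codes_numeric))
codes_padded  # "01012" "65239"
```

If the codes in the data never start with zero, the padding is harmless and the plain conversion is sufficient.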

3.2 Extracting the Study Data

For the purpose of this exercise, we will extract weekday commuting flows for the morning peak, i.e. records with TIME_PER_HOUR from 6 to 9 inclusive.

odbus6_9 <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 &
           TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
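To see what this pipeline keeps and drops, here is a minimal sketch on a made-up five-row table with the same column names. The values are illustrative only, and the .groups = "drop" argument is an addition that returns an ungrouped result.

```r
library(dplyr)

# Toy records mimicking the structure of odbus (illustrative values only)
toy_od <- tibble(
  DAY_TYPE            = c("WEEKDAY", "WEEKDAY", "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKDAY"),
  TIME_PER_HOUR       = c(6, 9, 10, 7, 7),
  ORIGIN_PT_CODE      = c("A", "A", "A", "A", "B"),
  DESTINATION_PT_CODE = c("B", "B", "B", "B", "A"),
  TOTAL_TRIPS         = c(2, 3, 100, 50, 4)
)

# Keep weekday rows with hour 6 to 9 inclusive, then total trips per OD pair
toy_peak <- toy_od %>%
  filter(DAY_TYPE == "WEEKDAY",
         TIME_PER_HOUR >= 6,
         TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE, DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS), .groups = "drop")

toy_peak
```

The hour-10 row and the weekend row are discarded, so the A to B pair sums to 5 trips and the B to A pair to 4.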

Let’s look at the table below to see the contents.

datatable(odbus6_9)

We will save the output in rds format for future use.

write_rds(odbus6_9, "data/rds/odbus6_9.rds")

The code chunk below will be used to import the saved odbus6_9.rds into the R environment.

odbus6_9 <- read_rds("data/rds/odbus6_9.rds")

4. WORKING WITH GEOSPATIAL DATA

For this exercise, two geospatial datasets will be used:

  1. BusStop: This dataset provides the locations of bus stops as of the last quarter of 2022.

  2. MPSZ-2019: This dataset contains the sub-zone boundaries from the URA Master Plan 2019.

Both datasets are available in ESRI shapefile format.

4.1 Importing Geospatial Data

Let’s import the two datasets.

busstop <- st_read(dsn = "data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `C:\loriellemalveda\ISSS626-GAA\Hands-on_Ex\Hands-on_Ex10_01\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5159 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21

mpsz <- st_read(dsn = "data/geospatial",
                layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `C:\loriellemalveda\ISSS626-GAA\Hands-on_Ex\Hands-on_Ex10_01\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

mpsz
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
                 SUBZONE_N SUBZONE_C       PLN_AREA_N PLN_AREA_C       REGION_N
1              MARINA EAST    MESZ01      MARINA EAST         ME CENTRAL REGION
2         INSTITUTION HILL    RVSZ05     RIVER VALLEY         RV CENTRAL REGION
3           ROBERTSON QUAY    SRSZ01  SINGAPORE RIVER         SR CENTRAL REGION
4  JURONG ISLAND AND BUKOM    WISZ01  WESTERN ISLANDS         WI    WEST REGION
5             FORT CANNING    MUSZ02           MUSEUM         MU CENTRAL REGION
6         MARINA EAST (MP)    MPSZ05    MARINE PARADE         MP CENTRAL REGION
7                   SUDONG    WISZ03  WESTERN ISLANDS         WI    WEST REGION
8                  SEMAKAU    WISZ02  WESTERN ISLANDS         WI    WEST REGION
9           SOUTHERN GROUP    SISZ02 SOUTHERN ISLANDS         SI CENTRAL REGION
10                 SENTOSA    SISZ01 SOUTHERN ISLANDS         SI CENTRAL REGION
   REGION_C                       geometry
1        CR MULTIPOLYGON (((33222.98 29...
2        CR MULTIPOLYGON (((28481.45 30...
3        CR MULTIPOLYGON (((28087.34 30...
4        WR MULTIPOLYGON (((14557.7 304...
5        CR MULTIPOLYGON (((29542.53 31...
6        CR MULTIPOLYGON (((35279.55 30...
7        WR MULTIPOLYGON (((15772.59 21...
8        WR MULTIPOLYGON (((19843.41 21...
9        CR MULTIPOLYGON (((30870.53 22...
10       CR MULTIPOLYGON (((26879.04 26...
Note
  • The st_read() function of the sf package imports a shapefile into R as an sf data frame.

  • The st_transform() function of the sf package reprojects the data to EPSG:3414 (SVY21 / Singapore TM).
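A quick way to confirm that a reprojection took effect is to inspect the CRS before and after. The sketch below uses a single made-up point given in WGS 84 longitude/latitude rather than the exercise data.

```r
library(sf)

# One illustrative point in geographic coordinates (WGS 84, EPSG:4326)
pt_wgs84 <- st_sfc(st_point(c(103.85, 1.29)), crs = 4326)

# Reproject to SVY21 / Singapore TM (EPSG:3414)
pt_svy21 <- st_transform(pt_wgs84, crs = 3414)

st_crs(pt_wgs84)$epsg     # 4326
st_crs(pt_svy21)$epsg     # 3414
st_is_longlat(pt_svy21)   # FALSE: coordinates are now in metres
```

The same checks can be applied to busstop and mpsz, since both layers need to share one projected CRS before they are overlaid.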

The following code chunk saves the mpsz sf tibble data frame as an RDS file for future use.

write_rds(mpsz, "data/rds/mpsz.rds")

5. GEOSPATIAL DATA WRANGLING

5.1 Combining Busstop and mpsz

The code chunk below populates the planning subzone code (SUBZONE_C) from the mpsz sf data frame into the busstop sf data frame.

busstop_mpsz <- st_intersection(busstop, mpsz) %>%
  select(BUS_STOP_N, SUBZONE_C) %>%
  st_drop_geometry()
Note
  • st_intersection() performs a point-in-polygon overlay; the output is a point sf object.

  • select() of the dplyr package is then used to retain only BUS_STOP_N and SUBZONE_C, and st_drop_geometry() converts the result into an ordinary data frame.

  • Five bus stops are excluded from the resulting data frame because they are outside the Singapore boundary.
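The overlay behaviour described in these notes can be reproduced on a toy example. The sketch below builds two hypothetical square zones and three hypothetical stops, one of which lies outside both zones; none of the names or coordinates come from the exercise data.

```r
library(sf)

# Helper (hypothetical) that builds a 1 x 1 square with its lower-left corner at (x0, y0)
unit_square <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1),
                        c(x0, y0 + 1), c(x0, y0))))
}

# Two adjacent toy zones and three toy stops; stop "C" is outside both zones
zones <- st_sf(SUBZONE_C = c("Z1", "Z2"),
               geometry = st_sfc(unit_square(0, 0), unit_square(1, 0), crs = 3414))
stops <- st_sf(BUS_STOP_N = c("A", "B", "C"),
               geometry = st_sfc(st_point(c(0.5, 0.5)),
                                 st_point(c(1.5, 0.5)),
                                 st_point(c(5, 5)), crs = 3414))

# Point-in-polygon overlay, then drop geometry to get a plain lookup table
# (sf warns that attributes are assumed to be spatially constant)
lookup <- st_drop_geometry(st_intersection(stops, zones))
lookup

# Stops that fell outside every zone
setdiff(stops$BUS_STOP_N, lookup$BUS_STOP_N)
```

Comparing the stop codes before and after the overlay, as in the last line, is one way to list exactly which stops were dropped.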

datatable(busstop_mpsz)

Before moving to the next step, it is wise to save the output into rds format.

write_rds(busstop_mpsz, "data/rds/busstop_mpsz.rds")  

Next, we will append the planning subzone code from the busstop_mpsz data frame to the odbus6_9 data frame.

od_data <- left_join(odbus6_9 , busstop_mpsz,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_SZ = SUBZONE_C,
         DESTIN_BS = DESTINATION_PT_CODE)

Before we continue, it is good practice to check for duplicates.

duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

If duplicates are found, retain only the unique records.

od_data <- unique(od_data)

It is also good practice to recheck that the duplicate issue has been fully addressed.
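A minimal sketch of such a recheck, using base R on a made-up three-row table (the values are illustrative only):

```r
# Toy table with one fully duplicated row
toy <- data.frame(
  ORIGIN_BS = c("A", "A", "B"),
  DESTIN_BS = c("B", "B", "C"),
  TRIPS     = c(5, 5, 2)
)

any(duplicated(toy))          # TRUE: the second row repeats the first

toy_unique <- unique(toy)

any(duplicated(toy_unique))   # FALSE: no duplicates remain
nrow(toy_unique)              # 2
```

Applied to od_data, the same any(duplicated(...)) call should return FALSE once the duplicates have been removed.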

Next, we will update the od_data data frame with the planning subzone codes of the destination bus stops, and then aggregate the trips by origin and destination subzone.

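# Attach the subzone code of each destination bus stop
od_data <- left_join(od_data , busstop_mpsz,
            by = c("DESTIN_BS" = "BUS_STOP_N")) 

# Check for duplicated records introduced by the join
duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

# Keep unique records only
od_data <- unique(od_data)

# Drop unmatched rows and total the trips per origin/destination subzone pair
od_data <- od_data %>%
  rename(DESTIN_SZ = SUBZONE_C) %>%
  drop_na() %>%
  group_by(ORIGIN_SZ, DESTIN_SZ) %>%
  summarise(MORNING_PEAK = sum(TRIPS))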

Do not forget to save the output into an rds file format!

write_rds(od_data, "data/rds/od_data_fii.rds")
od_data_fii <- read_rds("data/rds/od_data_fii.rds")

6. VISUALIZING SPATIAL INTERACTION

In this section, you will learn how to create a desire line using the stplanr package.

6.1 Removing Intra-zonal Flows

Intra-zonal flows, where the origin and destination fall in the same subzone, will not be plotted. The code chunk below removes them.

od_data_fij <- od_data[od_data$ORIGIN_SZ!=od_data$DESTIN_SZ,]
write_rds(od_data_fij, "data/rds/od_data_fij.rds")
od_data_fij <- read_rds("data/rds/od_data_fij.rds")
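Before discarding intra-zonal flows, it can be useful to know how much of the total they represent. The sketch below shows the same row filter on a made-up table, together with the share of trips removed; the values are illustrative only.

```r
# Toy subzone-level flows (illustrative values only)
toy_flows <- data.frame(
  ORIGIN_SZ    = c("Z1", "Z1", "Z2"),
  DESTIN_SZ    = c("Z1", "Z2", "Z2"),
  MORNING_PEAK = c(500, 120, 300)
)

# Keep only rows whose origin and destination subzones differ
toy_inter <- toy_flows[toy_flows$ORIGIN_SZ != toy_flows$DESTIN_SZ, ]
toy_inter

# Share of all trips that were intra-zonal and are therefore not plotted
intra_share <- 1 - sum(toy_inter$MORNING_PEAK) / sum(toy_flows$MORNING_PEAK)
intra_share
```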

6.2 Creating Desire Lines

In the code chunk below, the od2line() function from the stplanr package is used to generate desire lines.

flowLine <- od2line(flow = od_data_fij, 
                    zones = mpsz,
                    zone_code = "SUBZONE_C")
write_rds(flowLine, "data/rds/flowLine.rds")
flowLine <- read_rds("data/rds/flowLine.rds")
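Conceptually, a desire line is a straight line joining an origin zone to a destination zone. The sketch below builds such lines by hand with sf, connecting zone centroids on made-up square zones. It illustrates the idea only and is not stplanr's implementation; the helper, zone codes and trip counts are hypothetical.

```r
library(sf)

# Helper (hypothetical) that builds a 1 x 1 square with its lower-left corner at (x0, y0)
unit_square <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1),
                        c(x0, y0 + 1), c(x0, y0))))
}

# Three toy zones and three toy inter-zonal flows
zones <- st_sf(SUBZONE_C = c("Z1", "Z2", "Z3"),
               geometry = st_sfc(unit_square(0, 0), unit_square(2, 0),
                                 unit_square(0, 2), crs = 3414))
flows <- data.frame(ORIGIN_SZ    = c("Z1", "Z1", "Z2"),
                    DESTIN_SZ    = c("Z2", "Z3", "Z3"),
                    MORNING_PEAK = c(120, 40, 75))

# Centroid coordinates, indexed by zone code
cent <- st_coordinates(st_centroid(st_geometry(zones)))
rownames(cent) <- zones$SUBZONE_C

# One straight line per flow, from origin centroid to destination centroid
line_list <- lapply(seq_len(nrow(flows)), function(i) {
  st_linestring(rbind(cent[flows$ORIGIN_SZ[i], ], cent[flows$DESTIN_SZ[i], ]))
})
desire <- st_sf(flows, geometry = st_sfc(line_list, crs = 3414))
desire
```

The result has one LINESTRING per OD pair and keeps the flow attributes, which is the shape of data that the mapping step expects.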

6.3 Visualizing the Desire Lines

The code chunk below is used to visualize the resulting desire lines.

tm_shape(mpsz) +
  tm_polygons() +
  tm_shape(flowLine) +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)

Warning

Be patient: rendering takes longer because of the transparency argument (i.e. alpha).

When flow data are messy and highly skewed, as shown above, it’s often more effective to focus on selected flows, such as those greater than or equal to 5000, as demonstrated below.

tm_shape(mpsz) +
  tm_polygons() +
  tm_shape(flowLine %>%
             filter(MORNING_PEAK >= 5000)) +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)
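The cutoff of 5000 is a judgment call. One way to sanity-check a candidate threshold is to look at how many flows it keeps and what share of total trips those flows carry. The sketch below does this on a made-up, heavily skewed vector of flow volumes; the numbers are illustrative and not taken from the exercise data.

```r
# Toy flow volumes with a long right tail (illustrative values only)
toy_peak <- c(12, 40, 85, 150, 320, 900, 2400, 5200, 11000, 48000)

# Quantiles show how skewed the distribution is
quantile(toy_peak, probs = c(0.5, 0.9))

# Flows retained by the threshold, and the share of all trips they carry
threshold  <- 5000
keep       <- toy_peak >= threshold
n_kept     <- sum(keep)
trip_share <- sum(toy_peak[keep]) / sum(toy_peak)

n_kept
trip_share
```

With the real data, the same calculation on flowLine$MORNING_PEAK indicates how much of the overall movement a filtered map still represents.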